Building Harnesses – From Agent to Autonomous Runtime¶
The Fork in the Road¶
Every framework reaches a point where single-purpose agent building diverges from autonomous runtime building. This is that fork.
```
                adk-fluent
                    │
      ┌─────────────┼─────────────┐
      │             │             │
Single Agent     Workflow       Harness
Agent("x")       A >> B >> C    H.workspace()
.instruct()      .step()        H.event_bus()
.tool(fn)        .branch()      H.budget_monitor()
.ask(prompt)     .build()       H.repl()
      │             │             │
One question     Multi-step     Autonomous
one answer       pipeline       coding runtime
```
Building a single agent is declarative — you describe what the agent is. Building a harness is imperative — you wire up how it runs. The core framework (Agent, Pipeline, FanOut, Loop) handles the first path. The H namespace handles the second.
A harness is what turns a single-purpose agent into an autonomous coding runtime — the kind of thing Claude Code, Gemini CLI, or Cursor’s agent mode is built on. It’s the difference between “an LLM that can answer questions” and “an LLM that can read your codebase, edit files, run tests, and self-correct.”
adk-fluent doesn’t ship a harness. It ships the building blocks so you can build your harness, tuned to your domain, your security model, your tools.
The Five Layers¶
Every production harness has five layers. Skip one and your harness will be fragile in a specific, predictable way.
```
┌────────────────────────────────────────────────────┐
│ 5. RUNTIME        REPL, event loop, rendering      │
│ 4. OBSERVABILITY  EventBus, tape, hooks, renderer  │
│ 3. SAFETY         Permissions, sandbox, budgets    │
│ 2. TOOLS          Workspace, web, MCP, git, tasks  │
│ 1. INTELLIGENCE   Agent + skills + manifold        │
└────────────────────────────────────────────────────┘
```
| Layer | What it does | What breaks without it |
|---|---|---|
| Intelligence | LLM + expertise + capability discovery | Agent doesn't know how to do the task |
| Tools | File I/O, shell, search, web, git | Agent can think but can't act |
| Safety | Permissions, sandboxing, token budgets | Agent can act but might destroy things |
| Observability | Events, logging, hooks, replay | You can't debug when things go wrong |
| Runtime | REPL, streaming, compression, interrupts | No way to interact with the agent |
Layer 1: Intelligence¶
Start with an agent that has domain expertise. Skills load from SKILL.md files — they're cached in static_instruction rather than re-sent to the LLM on every turn.
```python
from adk_fluent import Agent, H

agent = (
    Agent("coder", "gemini-2.5-pro")
    .use_skill("skills/code_review/")
    .use_skill("skills/python_best_practices/")
    .instruct("You are an expert coding assistant. Help the user.")
)
```
For dynamic capability discovery (the manifold pattern), see Manifold Guide.
Layer 2: Tools¶
The H namespace provides sandboxed tool factories. Combine them with +:
```python
tools = (
    H.workspace("/project", diff_mode=True, multimodal=True)
    + H.web()
    + H.git_tools("/project")
    + H.processes("/project")
    + H.notebook("/project")
)

agent = agent.tools(tools)
```
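The `+` composition above can be pictured as simple list concatenation. Here is a minimal, self-contained sketch of the pattern (the `ToolSet` class and its tool names are illustrative, not adk-fluent's internals):

```python
# Hypothetical sketch of a "+"-composable toolset: each factory returns a
# collection whose __add__ concatenates tool lists, preserving order.
class ToolSet:
    def __init__(self, tools):
        self.tools = list(tools)

    def __add__(self, other):
        # Composition produces a new set; neither operand is mutated.
        return ToolSet(self.tools + other.tools)

workspace = ToolSet(["read", "edit", "write", "glob", "grep", "bash", "ls"])
web = ToolSet(["web_fetch", "web_search"])

combined = workspace + web
print(len(combined.tools))  # all nine tools, workspace tools first
```

Because each `+` returns a fresh collection, the composition itself is the configuration: what you add is exactly what the agent can call.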
What each tool set provides¶
| Factory | Tools | Purpose |
|---|---|---|
| `H.workspace()` | read, edit, write, glob, grep, bash, ls | Core file/shell operations |
| `H.web()` | web_fetch, web_search | URL fetching and search |
| `H.git_tools()` | git_status, git_diff, git_log, git_commit, git_branch | Version control |
| `H.processes()` | start_process, check_process, stop_process | Background dev servers |
| `H.notebook()` | read_notebook, edit_notebook_cell | Jupyter notebooks |
| MCP | (dynamic) | MCP server tools |
Workspace options¶
```python
H.workspace(
    "/project",
    allow_shell=True,         # Enable bash tool
    allow_network=True,       # Allow network from shell
    read_only=False,          # True disables edit/write
    streaming=True,           # PTY-based streaming bash
    diff_mode=True,           # Preview edits as diffs before applying
    multimodal=True,          # Read images/PDFs as base64
    max_output_bytes=100_000,
)
```
Layer 3: Safety¶
Three concerns: who can call what (permissions), where can they write (sandbox), and how much can they spend (budgets).
Permissions¶
Compose policies with .merge(). Deny wins over ask wins over allow.
```python
permissions = (
    H.auto_allow("read_file", "glob_search", "grep_search", "list_dir")
    .merge(H.ask_before("edit_file", "write_file", "bash"))
    .merge(H.deny("rm_rf"))
)
```
Pattern-based rules for large tool sets:
```python
permissions = (
    H.allow_patterns("read_*", "list_*", "grep_*")
    .merge(H.deny_patterns("*_delete", "*_destroy"))
)
```
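The "deny wins over ask wins over allow" rule can be expressed as keeping the strictest decision per tool when rule sets merge. A minimal sketch of that precedence semantics (pure Python, not the library's implementation):

```python
# Each rule set maps tool name -> decision; merging keeps the strictest
# decision seen for each tool. Deny > ask > allow.
STRICTNESS = {"allow": 0, "ask": 1, "deny": 2}

def merge(*rule_sets):
    merged = {}
    for rules in rule_sets:
        for tool, decision in rules.items():
            prev = merged.get(tool)
            if prev is None or STRICTNESS[decision] > STRICTNESS[prev]:
                merged[tool] = decision
    return merged

rules = merge(
    {"edit_file": "allow", "bash": "allow"},
    {"edit_file": "ask"},    # stricter than allow -> wins
    {"bash": "deny"},        # strictest -> wins
)
print(rules)
```

Note that merge order does not matter under this scheme: strictness, not recency, decides conflicts.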
Sandbox¶
Confines file operations to the workspace directory. Symlink-safe — resolves real paths before checking containment.
```python
sandbox = H.sandbox(
    workspace="/project",
    allow_shell=True,
    allow_network=True,
    read_paths=["/usr/share/dict"],    # Additional readable paths
    write_paths=["/tmp/agent-work"],   # Additional writable paths
)
```
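The symlink-safe containment check described above boils down to resolving real paths before comparing them. A sketch of that core idea, assuming standard-library primitives (the function name is illustrative):

```python
import os

def is_contained(path: str, roots: list[str]) -> bool:
    """True if `path` resolves to a location under any allowed root."""
    real = os.path.realpath(path)  # resolves symlinks and ".." segments
    for root in roots:
        real_root = os.path.realpath(root)
        # commonpath avoids the '/project-evil'.startswith('/project') trap
        if os.path.commonpath([real, real_root]) == real_root:
            return True
    return False

print(is_contained("/project/src/../README.md", ["/project"]))  # inside
print(is_contained("/etc/passwd", ["/project"]))                # outside
```

Resolving before checking is what defeats the classic escape: a symlink inside the workspace that points at `/etc` resolves to its real target and fails containment.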
Token budgets¶
BudgetMonitor tracks cumulative tokens and fires callbacks at thresholds. It does NOT compress — it triggers your handler, which decides what to do.
```python
def on_budget_warning(monitor):
    print(f"Warning: {monitor.utilization:.0%} budget used, "
          f"~{monitor.estimated_turns_remaining} turns remaining")

def on_budget_critical(monitor):
    # Switch to aggressive compression
    print("Critical: compressing context")
    monitor.adjust(monitor.current_tokens // 2)

monitor = (
    H.budget_monitor(200_000)
    .on_threshold(0.7, on_budget_warning)
    .on_threshold(0.9, on_budget_critical)
)

agent = agent.after_model(monitor.after_model_hook())
```
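The threshold mechanics are worth seeing concretely: track cumulative tokens against a budget and fire each callback exactly once when its threshold is first crossed. A hypothetical sketch (this `Monitor` class is illustrative, not BudgetMonitor's source):

```python
class Monitor:
    def __init__(self, budget):
        self.budget = budget
        self.current_tokens = 0
        self._thresholds = []  # [fraction, callback, fired] entries

    def on_threshold(self, fraction, callback):
        self._thresholds.append([fraction, callback, False])
        return self

    @property
    def utilization(self):
        return self.current_tokens / self.budget

    def record(self, tokens):
        self.current_tokens += tokens
        for entry in self._thresholds:
            fraction, callback, fired = entry
            if not fired and self.utilization >= fraction:
                entry[2] = True  # fire once per threshold, never again
                callback(self)

fired = []
m = Monitor(1000).on_threshold(0.7, lambda mon: fired.append("warn"))
m.record(500)  # 50% - below threshold, nothing fires
m.record(300)  # 80% - crosses 0.7, fires once
m.record(100)  # 90% - already fired, stays quiet
print(fired)
```

The once-per-threshold latch is the important detail: without it, every turn past 70% would spam your warning handler.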
Wiring safety with .harness()¶
The .harness() method bundles all safety concerns and wires the callbacks:
```python
agent = agent.harness(
    permissions=permissions,
    sandbox=H.workspace_only("/project"),
    usage=H.usage(cost_per_million_input=2.50, cost_per_million_output=10.0),
    memory=H.memory("/project/.agent-memory.md"),
    on_error=H.on_error(retry={"bash", "web_fetch"}, skip={"glob_search"}),
)
```
Layer 4: Observability¶
The EventBus is the backbone. Everything subscribes to it instead of building its own observation layer.
```python
bus = H.event_bus()

# SessionTape records everything to JSONL
tape = bus.tape()

# Hooks fire shell commands on events
hooks = (
    H.hooks("/project")
    .on_edit("ruff check {file_path}")
    .on_error("notify-send 'Agent error: {error}'")
    .on("turn_complete", "./scripts/post-turn.sh")
)
bus.hooks(hooks)

# Wire the bus into agent callbacks
agent = (
    agent
    .before_tool(bus.before_tool_hook())
    .after_tool(bus.after_tool_hook())
    .after_model(bus.after_model_hook())
)
```
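The "everything subscribes to one bus" idea is plain publish/subscribe. A minimal sketch of the pattern, assuming nothing about adk-fluent's internals (class and event names here are illustrative):

```python
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def emit(self, event_type, **payload):
        # Every subscriber sees the same event; none of them need to know
        # about each other.
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
tape = []  # a SessionTape-like recorder: just append every event
bus.subscribe("tool_call", tape.append)
bus.subscribe("tool_call", lambda e: print("tool:", e["name"]))

bus.emit("tool_call", name="read_file", args={"path": "README.md"})
```

The tape, the renderer, and the hooks are all just subscribers; adding a new observer never touches the agent's wiring.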
Per-tool error recovery¶
ToolPolicy gives each tool its own error handling — retries with backoff for transient failures, graceful skips for non-critical tools, user escalation for dangerous operations:
```python
policy = (
    H.tool_policy()
    .retry("bash", max_attempts=3, backoff=1.0)
    .retry("web_fetch", max_attempts=2, backoff=0.5)
    .skip("glob_search", fallback="No matching files found.")
    .ask("edit_file", handler=user_approval_fn)
    .with_bus(bus)  # Emits error events
)

agent = agent.after_tool(policy.after_tool_hook())
```
Rendering events¶
Renderers convert events to display strings. They don’t handle I/O — you write to your output:
```python
renderer = H.renderer("rich", show_timing=True, show_args=False)

# In your event loop:
for event in events:
    line = renderer.render(event)
    if line:
        print(line)
```
Layer 5: Runtime¶
Interactive REPL¶
```python
from adk_fluent._harness import ReplConfig

repl = H.repl(
    agent.build(),
    dispatcher=H.dispatcher(bus=bus),
    hooks=hooks,
    compressor=H.compressor(threshold=100_000),
    config=ReplConfig(
        prompt_prefix="coder> ",
        welcome_message="Ready. Type /help for commands.",
        auto_checkpoint=True,  # Git checkpoint before destructive tools
    ),
)

await repl.run()
```
Slash commands¶
```python
cmds = H.commands()
cmds.register("clear", lambda args: "Context cleared.", description="Clear context")
cmds.register("model", lambda args: set_model(args), description="Switch model")
cmds.register("undo", lambda args: git_checkpoint.restore(), description="Undo last change")
cmds.register("compact", lambda args: compress(), description="Compress context")
```
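Under the hood, slash-command dispatch is just a registry lookup on the first token of the line. A self-contained sketch of that dispatch loop (illustrative names, not the library's implementation):

```python
class Commands:
    def __init__(self):
        self._handlers = {}

    def register(self, name, handler, description=""):
        self._handlers[name] = (handler, description)

    def dispatch(self, line):
        # "/model gemini-2.5-pro" -> name="model", args="gemini-2.5-pro"
        name, _, args = line.lstrip("/").partition(" ")
        if name not in self._handlers:
            return f"Unknown command: /{name}"
        handler, _ = self._handlers[name]
        return handler(args)

cmds = Commands()
cmds.register("clear", lambda args: "Context cleared.", description="Clear context")
print(cmds.dispatch("/clear"))
print(cmds.dispatch("/nope"))
```

Returning a string for unknown commands (instead of raising) keeps the REPL loop simple: every dispatch result is something you can render.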
Interrupt and resume¶
Cooperative cancellation — the token is checked before each tool call:
```python
from adk_fluent._harness import make_cancellation_callback

token = H.cancellation_token()
agent = agent.before_tool(make_cancellation_callback(token))

# In your UI thread:
token.cancel()                            # Interrupt
snapshot = token.snapshot                 # Mid-turn state
resume_prompt = snapshot.resume_prompt()  # "Resuming: Fix the bug..."
token.reset()                             # Ready for next turn
```
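Cooperative cancellation means the running turn is never killed from outside; it checks a flag at safe points and unwinds itself. A pure-Python sketch of the token's lifecycle (illustrative, not adk-fluent's class):

```python
class CancelledError(Exception):
    pass

class CancellationToken:
    def __init__(self):
        self.cancelled = False
        self.snapshot = None

    def cancel(self, snapshot=None):
        self.cancelled = True
        self.snapshot = snapshot  # mid-turn state for later resume

    def check(self):
        # Called at safe points, e.g. before each tool call.
        if self.cancelled:
            raise CancelledError("turn interrupted")

    def reset(self):
        self.cancelled = False
        self.snapshot = None

token = CancellationToken()
token.check()  # no-op while not cancelled
token.cancel(snapshot={"pending_tool": "bash"})
try:
    token.check()  # a before-tool hook would hit this and unwind the turn
except CancelledError:
    print("interrupted, snapshot:", token.snapshot)
token.reset()  # ready for the next turn
```

Checking before each tool call is the key trade-off: cancellation is prompt enough for a UI, but a tool that is already running completes rather than being torn down mid-write.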
Conversation forking¶
Branch session state for parallel exploration:
```python
forks = H.forks()

# Save current state
forks.fork("conservative", current_state)
forks.fork("aggressive", current_state)

# Compare approaches
diff = forks.diff("conservative", "aggressive")

# Merge: take conservative approach but include aggressive findings
merged = forks.merge("conservative", "aggressive", strategy="prefer", prefer="conservative")
```
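If you think of session state as a dict, fork/diff/merge have a simple shape: fork snapshots the dict, diff reports keys whose values differ, and a "prefer" merge keeps the preferred branch's value on conflict. A sketch under those assumptions (illustrative semantics, not the library's):

```python
class Forks:
    def __init__(self):
        self._branches = {}

    def fork(self, name, state):
        self._branches[name] = dict(state)  # snapshot, not a shared reference

    def diff(self, a, b):
        sa, sb = self._branches[a], self._branches[b]
        keys = set(sa) | set(sb)
        return {k: (sa.get(k), sb.get(k)) for k in keys if sa.get(k) != sb.get(k)}

    def merge(self, a, b, prefer):
        preferred = self._branches[prefer]
        other = self._branches[b if prefer == a else a]
        merged = dict(other)
        merged.update(preferred)  # preferred branch wins every conflict
        return merged

forks = Forks()
forks.fork("conservative", {"plan": "small patch", "risk": "low"})
forks.fork("aggressive", {"plan": "rewrite module", "findings": "dead code"})
print(forks.diff("conservative", "aggressive"))
merged = forks.merge("conservative", "aggressive", prefer="conservative")
print(merged)
```

Keys unique to the non-preferred branch survive the merge, which is exactly the "take conservative but include aggressive findings" case from the example above.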
Background tasks¶
TaskLedger bridges dispatch()/join() to LLM-callable tools:
```python
ledger = H.task_ledger().with_bus(bus)
agent = agent.tools(ledger.tools())  # [launch_task, check_task, list_tasks, cancel_task]
```
Putting It All Together¶
Here’s a complete Claude-Code-class harness in ~50 lines:
```python
from adk_fluent import Agent, H, C
from adk_fluent._harness import ReplConfig, make_cancellation_callback

project = "/path/to/project"

# --- EventBus backbone ---
bus = H.event_bus()
tape = bus.tape()

# --- Intelligence ---
agent = (
    Agent("coder", "gemini-2.5-pro")
    .use_skill("skills/code_review/")
    .use_skill("skills/python_best_practices/")
    .instruct("You are an expert coding assistant.")
    .context(C.rolling(n=20, summarize=True))
)

# --- Tools ---
agent = agent.tools(
    H.workspace(project, diff_mode=True, multimodal=True)
    + H.web()
    + H.git_tools(project)
    + H.processes(project)
    + H.task_ledger().with_bus(bus).tools()
)

# --- Safety ---
agent = agent.harness(
    permissions=(
        H.auto_allow("read_file", "glob_search", "grep_search", "list_dir")
        .merge(H.ask_before("edit_file", "write_file", "bash"))
    ),
    sandbox=H.workspace_only(project),
    usage=H.usage(cost_per_million_input=2.50, cost_per_million_output=10.0),
    memory=H.memory(f"{project}/.agent-memory.md"),
    on_error=H.on_error(retry={"bash"}, skip={"glob_search"}),
)

# --- Observability ---
token = H.cancellation_token()
monitor = (
    H.budget_monitor(200_000)
    .on_threshold(0.9, lambda m: print("Compressing..."))
    .with_bus(bus)
)

agent = (
    agent
    .before_tool(bus.before_tool_hook())
    .before_tool(make_cancellation_callback(token))
    .after_tool(bus.after_tool_hook())
    .after_model(bus.after_model_hook())
    .after_model(monitor.after_model_hook())
)

# --- Runtime ---
repl = H.repl(
    agent.build(),
    hooks=H.hooks(project).on_edit("ruff check {file_path}"),
    compressor=H.compressor(100_000),
)

await repl.run()
```
Design Philosophy¶
Why not a HarnessAgent class?¶
Because every harness is different. A code review harness needs diff-mode editing and git tools but no web access. A research harness needs web tools but no file editing. A DevOps harness needs process management and MCP servers but no notebook support.
A monolithic HarnessAgent would either:
- Include everything (slow, insecure, wasteful)
- Include nothing (then it's just an Agent)
- Make you disable features (negative configuration — worse DX than assembling what you need)
The H namespace gives you building blocks. You compose what you need. The composition is the configuration.
Why separate EventBus, ToolPolicy, BudgetMonitor, TaskLedger?¶
These are the four foundations that prevent the common pitfall of reinventing framework capabilities at the harness level:
| Foundation | What it prevents | Composes with |
|---|---|---|
| EventBus | Building separate observation layers in each module | SessionTape, HookRegistry, Renderer |
| ToolPolicy | Reimplementing per-tool retry/fallback logic | |
| BudgetMonitor | Reimplementing token tracking outside | |
| TaskLedger | Reimplementing task tracking outside | Core dispatch primitives |
They exist because the core framework (S, C, M, T, G) is declarative and agent-scoped — perfect for building agents, but harness building needs imperative, session-scoped control. These four primitives bridge that gap.
When to take the harness fork¶
You need a harness when your agent needs to:
- **Persist across turns** — multi-turn conversation with memory, not one-shot Q&A
- **Use dangerous tools** — file editing, shell execution, git operations require permissions and sandboxing
- **Self-correct** — read errors, retry, adjust approach without human intervention
- **Stay within bounds** — token budgets, cost limits, time constraints
- **Be observable** — you need to know what happened, replay sessions, fire hooks
If your agent just answers questions or runs a pipeline, stick with Agent + workflows. The moment it needs to act autonomously in the world — reading codebases, editing files, running tests, managing processes — that’s when you cross the fork into harness territory.
The cookbook proof¶
See examples/cookbook/79_coding_agent_harness.py for a complete, tested, runnable Claude-Code-class harness built entirely from these building blocks. 27 tests, all 5 layers, all 4 foundation primitives wired together.