Steering Modes

How the SDK delivers phase instructions and per-turn context to the model during a Live session. This is the single most impactful configuration choice for multi-phase voice applications.

The Three Modes

ContextInjection (recommended)

The system instruction is set once at connect time and never updated. Phase instructions and per-turn modifiers are delivered as model-role context turns via send_client_content.

Live::builder()
    .model(GeminiModel::Gemini2_0FlashLive)
    .instruction("You are a restaurant reservation assistant at Sapore d'Italia.")
    .steering_mode(SteeringMode::ContextInjection)
    .phase("greeting")
        .instruction("Welcome the guest warmly and ask how you can help.")
        .done()
    .phase("booking")
        .instruction("Help the guest find an available time slot.")
        .done()
    .initial_phase("greeting")

What happens on phase transition:

The phase instruction ("Welcome the guest...") is sent as a model-role content turn
Per-turn modifiers (with_context, with_state, when) are also sent as model-role turns
The system instruction ("You are a restaurant...") is never touched

When to use: Most multi-phase voice apps. The base persona stays stable across phases, and phase-specific behavior is guided through conversational context. Phase transitions have lower latency, with no spikes from re-processing the system instruction.

InstructionUpdate (default)

The system instruction is replaced on every phase transition. Per-turn modifiers are baked into the instruction text.

Live::builder()
    .model(GeminiModel::Gemini2_0FlashLive)
    .instruction("You are a helpful assistant.")
    .steering_mode(SteeringMode::InstructionUpdate)  // this is the default
    .phase("receptionist")
        .instruction("You are a medical receptionist. Schedule appointments.")
        .done()
    .phase("triage_nurse")
        .instruction("You are a triage nurse. Assess symptom severity.")
        .done()
    .initial_phase("receptionist")

What happens on phase transition:

The entire system instruction is replaced with the new phase's instruction
Per-turn modifiers are appended to the instruction text
The model re-processes its full context with the new instruction

When to use: When phases represent genuinely different personas or roles. Replacing the whole system instruction gives the model a clean persona switch, which it needs to shift behavior convincingly.

Hybrid

The system instruction is replaced on phase transition (as in InstructionUpdate), but per-turn modifiers are delivered as model-role context turns (as in ContextInjection).

Live::builder()
    .steering_mode(SteeringMode::Hybrid)
    .phase("sales")
        .instruction("You are a sales representative.")
        .with_context(|s| format!("Customer budget: {}", s.get::<String>("budget").unwrap_or_default()))
        .done()
    .phase("support")
        .instruction("You are a technical support engineer.")
        .with_context(|s| format!("Ticket: {}", s.get::<String>("ticket_id").unwrap_or_default()))
        .done()

When to use: When you need persona shifts on transition but also want lightweight per-turn context updates within each phase. Uncommon in practice -- pick ContextInjection or InstructionUpdate unless you have a specific reason for both.
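To make the routing concrete, here is a toy model of what each mode sends on a phase transition. Everything in it (`Mode`, `TransitionOutput`, `route_transition`) is hypothetical and is not the gemini-rs API. Modifiers are assumed to be already evaluated to strings, and the newline separator used when appending them under InstructionUpdate is an assumption.

```rust
/// Toy mirror of the three steering modes (hypothetical, not the gemini-rs enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    ContextInjection,
    InstructionUpdate,
    Hybrid,
}

/// What a phase transition delivers to the model in this toy model.
#[derive(Debug, PartialEq)]
pub struct TransitionOutput {
    /// `Some(text)` if the system instruction is replaced, `None` if untouched.
    pub new_system_instruction: Option<String>,
    /// Model-role context turns, in send order.
    pub context_turns: Vec<String>,
}

/// Route a phase instruction plus evaluated per-turn modifiers by mode.
pub fn route_transition(mode: Mode, phase_instruction: &str, modifiers: &[String]) -> TransitionOutput {
    match mode {
        // System instruction untouched; phase instruction and modifiers become context turns.
        Mode::ContextInjection => {
            let mut turns = vec![phase_instruction.to_string()];
            turns.extend(modifiers.iter().cloned());
            TransitionOutput { new_system_instruction: None, context_turns: turns }
        }
        // System instruction replaced; modifiers are appended to its text (assumed newline-separated).
        Mode::InstructionUpdate => {
            let mut text = phase_instruction.to_string();
            for m in modifiers {
                text.push('\n');
                text.push_str(m);
            }
            TransitionOutput { new_system_instruction: Some(text), context_turns: Vec::new() }
        }
        // System instruction replaced; modifiers still travel as context turns.
        Mode::Hybrid => TransitionOutput {
            new_system_instruction: Some(phase_instruction.to_string()),
            context_turns: modifiers.to_vec(),
        },
    }
}
```

Under ContextInjection the instruction slot stays `None` and everything rides as context turns; Hybrid is the only mode that produces both a new instruction and context turns.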

Decision Matrix

Question	Yes	No
Does the model's core persona change between phases?	`InstructionUpdate`	`ContextInjection`
Is latency on phase transitions a concern?	`ContextInjection`	Either works
Do you need per-turn dynamic context (state summaries, conditional hints)?	`ContextInjection` or `Hybrid`	`InstructionUpdate` is fine
Are phases just different stages of the same conversation?	`ContextInjection`	--
Are phases genuinely different agents (receptionist vs doctor)?	`InstructionUpdate`	--
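The matrix can be collapsed into a small helper when you want the choice to be explicit in code. `choose_mode` and its two boolean inputs are hypothetical; the sketch keeps only the persona and per-turn-context questions and treats the case where both are true as the Hybrid scenario described above.

```rust
/// Toy mirror of the three steering modes (hypothetical, not the gemini-rs enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    ContextInjection,
    InstructionUpdate,
    Hybrid,
}

/// Encode the decision matrix as a lookup on two questions.
pub fn choose_mode(persona_changes: bool, needs_per_turn_context: bool) -> Mode {
    match (persona_changes, needs_per_turn_context) {
        // Distinct personas plus dynamic per-turn context: the one case for Hybrid.
        (true, true) => Mode::Hybrid,
        // Distinct personas, no dynamic context: replace the instruction.
        (true, false) => Mode::InstructionUpdate,
        // Same persona across phases: keep the instruction stable and inject context.
        (false, _) => Mode::ContextInjection,
    }
}
```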

Anti-Patterns

Using InstructionUpdate for minor context changes

Problem: Every phase has the same persona but slightly different focus areas. Using InstructionUpdate causes unnecessary instruction re-processing latency on each transition.

// Anti-pattern: same persona, different focus -- InstructionUpdate is overkill
Live::builder()
    .steering_mode(SteeringMode::InstructionUpdate)  // unnecessary latency
    .phase("gather_name")
        .instruction("You are a restaurant host. Ask for the guest's name.")
        .done()
    .phase("gather_party_size")
        .instruction("You are a restaurant host. Ask for the party size.")
        .done()

Fix: Use ContextInjection. The base persona is set once, and phase-specific focus is delivered as context turns.

// Better: stable persona, lightweight phase steering
Live::builder()
    .instruction("You are a friendly host at Sapore d'Italia.")
    .steering_mode(SteeringMode::ContextInjection)
    .phase("gather_name")
        .instruction("Ask for the guest's name for the reservation.")
        .done()
    .phase("gather_party_size")
        .instruction("Ask how many guests will be dining.")
        .done()

Using ContextInjection when personas differ radically

Problem: Phases represent genuinely different agent personas (e.g., switching from a receptionist to a clinical nurse). Context injection is too subtle -- the model may not fully shift behavior.

// Anti-pattern: radically different personas via context injection
Live::builder()
    .instruction("You work at a medical clinic.")
    .steering_mode(SteeringMode::ContextInjection)  // too subtle for persona shift
    .phase("receptionist")
        .instruction("You are the front desk receptionist. Be warm and administrative.")
        .done()
    .phase("triage")
        .instruction("You are a clinical triage nurse. Be precise and medical.")
        .done()

Fix: Use InstructionUpdate so the model gets a clean persona reset.
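A corrected version of the configuration above, reusing only builder methods already shown on this page (check exact signatures against the gemini-rs API docs). Because InstructionUpdate replaces the entire system instruction on each transition, each phase instruction in this sketch restates the clinic context instead of relying on the base instruction.

```rust
// Better: each phase installs a complete persona as the system instruction
Live::builder()
    .model(GeminiModel::Gemini2_0FlashLive)
    .instruction("You work at a medical clinic.")
    .steering_mode(SteeringMode::InstructionUpdate)  // clean persona reset per phase
    .phase("receptionist")
        .instruction("You are the front desk receptionist at a medical clinic. Be warm and administrative.")
        .done()
    .phase("triage")
        .instruction("You are a clinical triage nurse at a medical clinic. Be precise and medical.")
        .done()
    .initial_phase("receptionist")
```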

Over-engineering with Hybrid

Problem: Using Hybrid when ContextInjection alone would suffice. Adds complexity without benefit.

// Anti-pattern: Hybrid when the persona doesn't actually change
Live::builder()
    .steering_mode(SteeringMode::Hybrid)  // unnecessary complexity
    .phase("greeting").instruction("Welcome the user.").done()
    .phase("main").instruction("Help with their request.").done()

Fix: Use ContextInjection. If the persona is stable, there's no reason to replace the system instruction.

Putting volatile state in the base instruction

Problem: The base instruction (set at connect time) includes dynamic state that changes every turn. With ContextInjection, this instruction is never updated.

// Anti-pattern: dynamic content in the base instruction
Live::builder()
    .instruction(format!("You are helping {}. Their order has {} items.",
        customer_name, order_count))  // stale after the first turn
    .steering_mode(SteeringMode::ContextInjection)

Fix: Keep the base instruction static. Use with_context() modifiers for dynamic state.

// Better: static base, dynamic context via modifiers
Live::builder()
    .instruction("You are a helpful order assistant.")
    .steering_mode(SteeringMode::ContextInjection)
    .phase_defaults(|d| d.with_context(|s| {
        format!("Customer: {}. Items in order: {}.",
            s.get::<String>("customer_name").unwrap_or_default(),
            s.get::<u32>("order_count").unwrap_or(0))
    }))
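The fix works because a modifier is re-evaluated against the current state on every turn, whereas a `format!` in the base instruction captures values once at connect time. Below is a std-only sketch of that per-turn evaluation, loosely modeled on `with_context(fn)` and `when(pred, text)`. The `State` alias, `Modifier` enum, `evaluate`, and `order_modifiers` are hypothetical stand-ins, not gemini-rs types.

```rust
use std::collections::HashMap;

/// Hypothetical session state: string keys to string values.
pub type State = HashMap<String, String>;

/// Toy per-turn modifiers, loosely modeled on with_context(fn) and when(pred, text).
pub enum Modifier {
    /// Always produces a line of context from the current state.
    WithContext(Box<dyn Fn(&State) -> String>),
    /// Produces its fixed text only while the predicate holds.
    When(Box<dyn Fn(&State) -> bool>, String),
}

/// Evaluate every modifier against the *current* state; run once per turn.
pub fn evaluate(modifiers: &[Modifier], state: &State) -> Vec<String> {
    modifiers
        .iter()
        .filter_map(|m| match m {
            Modifier::WithContext(f) => Some(f(state)),
            Modifier::When(pred, text) => pred(state).then(|| text.clone()),
        })
        .collect()
}

/// Example modifiers for an order assistant (keys are hypothetical).
pub fn order_modifiers() -> Vec<Modifier> {
    vec![
        Modifier::WithContext(Box::new(|s: &State| {
            let count = s.get("order_count").map(String::as_str).unwrap_or("0");
            format!("Items in order: {count}.")
        })),
        Modifier::When(
            Box::new(|s: &State| s.get("order_count").map_or(false, |n| n.as_str() != "0")),
            "Offer to review the order before checkout.".to_string(),
        ),
    ]
}
```

Calling `evaluate` after each turn picks up the latest `order_count`, and the `When` hint appears only once its predicate holds.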

How It Works Under the Hood

The three-lane processor evaluates steering at two points in the turn lifecycle:

  TurnComplete event
       |
  [Step 7]  Phase machine evaluates transitions
       |    --> if transition fires, resolved_instruction is set
       |
  [Steps 7d/7e/7f/12/13] Context accumulation
       |    --> tool advisory, repair nudge, steering modifiers,
       |        phase instruction, on_enter_context all push into
       |        a single context_buffer (Vec<Content>)
       |
  [Step 14] Batched context send
       |    --> ONE send_client_content(context_buffer, false)
       |    --> eliminates burst of separate WebSocket frames
       |
  [Step 14b] prompt_on_enter (triggers model response)
       |    --> send_client_content([], true) — separate frame

Batched delivery: All model-role context turns are accumulated into a single Vec<Content> and sent as one atomic WebSocket frame. This eliminates the burst of 3-5 separate send_client_content calls that could confuse the model or clash with concurrent user input.
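A std-only sketch of the batching pattern: context turns produced during TurnComplete are collected into one buffer and written as a single frame, and `prompt_on_enter` is a separate, empty frame, mirroring Steps 14 and 14b. `Content`, `FakeWire`, and `finish_turn` are hypothetical stand-ins that only record frames, and the boolean argument is assumed to be the turn-complete flag.

```rust
/// Hypothetical stand-in for a Content turn: a role plus text.
#[derive(Clone, Debug, PartialEq)]
pub struct Content {
    pub role: &'static str,
    pub text: String,
}

impl Content {
    /// Mirrors the idea of Content::model(text): a turn attributed to the model.
    pub fn model(text: &str) -> Self {
        Content { role: "model", text: text.to_string() }
    }
}

/// Records every frame written, as (turns, turn_complete).
#[derive(Default)]
pub struct FakeWire {
    pub frames: Vec<(Vec<Content>, bool)>,
}

impl FakeWire {
    /// Stand-in for send_client_content(turns, turn_complete): one call is one frame.
    pub fn send_client_content(&mut self, turns: Vec<Content>, turn_complete: bool) {
        self.frames.push((turns, turn_complete));
    }
}

/// Step 14: push every context piece into one buffer and send it once.
/// Step 14b: if prompt_on_enter is set, send an empty, turn-completing frame.
pub fn finish_turn(wire: &mut FakeWire, pieces: &[&str], prompt_on_enter: bool) {
    let context_buffer: Vec<Content> = pieces.iter().map(|p| Content::model(p)).collect();
    if !context_buffer.is_empty() {
        wire.send_client_content(context_buffer, false);
    }
    if prompt_on_enter {
        wire.send_client_content(Vec::new(), true);
    }
}
```

Three context pieces produce one frame rather than three; enabling the prompt adds exactly one more.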

The key insight: with ContextInjection, step 12 pushes the phase instruction into the context buffer as Content::model(instruction_text). The model sees it as its own prior speech, which naturally steers its behavior without the overhead of replacing the system instruction.

Context Delivery Timing

By default, the batched context frame is sent immediately during TurnComplete processing (ContextDelivery::Immediate). For voice apps where isolated WebSocket frames during silence can cause glitches, use ContextDelivery::Deferred:

Live::builder()
    .steering_mode(SteeringMode::ContextInjection)
    .context_delivery(ContextDelivery::Deferred)
    .phase("greeting")
        .instruction("Welcome the guest")
        .done()
    .initial_phase("greeting")

How deferred delivery works:

During TurnComplete, context turns are pushed into a PendingContext buffer (instead of sent)
The DeferredWriter wraps the session writer at the LiveHandle level
When user code calls handle.send_audio(), send_text(), or send_video(), the writer drains the buffer and sends the context immediately before the user content
The context arrives in the same burst as user input — no isolated frames during silence

When context is sent immediately regardless:

If a prompt is needed (prompt_on_enter: true or a repair nudge on the first attempt), the context is sent immediately — you can't defer a prompt because the model needs to respond now.

  Deferred delivery:                    Immediate delivery:

  TurnComplete                          TurnComplete
       |                                     |
  [context → PendingContext]            [context → wire now]
       |                                     |
  ... silence ...                       ... silence ...
       |                                     |
  User speaks                           User speaks
       |                                     |
  DeferredWriter.send_audio()           SessionHandle.send_audio()
  1. flush PendingContext               1. send audio
  2. send audio
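The deferred path can be modeled as a writer that parks context in a pending buffer and flushes it ahead of the next user frame, unless a prompt is needed. This is a simplified sketch under those assumptions: `ToyWriter`, `Frame`, and `Delivery` are hypothetical, only audio is modeled (text and video would flush the same way), and the prompt frame itself is omitted.

```rust
/// A frame as it would hit the socket in this toy model.
#[derive(Debug, PartialEq)]
pub enum Frame {
    Context(Vec<String>),
    Audio(Vec<u8>),
}

/// Hypothetical mirror of the two ContextDelivery options.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Delivery {
    Immediate,
    Deferred,
}

pub struct ToyWriter {
    delivery: Delivery,
    pending: Vec<String>, // plays the role of PendingContext
    pub wire: Vec<Frame>, // frames in send order
}

impl ToyWriter {
    pub fn new(delivery: Delivery) -> Self {
        ToyWriter { delivery, pending: Vec::new(), wire: Vec::new() }
    }

    /// TurnComplete: send now if Immediate or if a prompt is needed,
    /// otherwise park the context until the user's next frame.
    pub fn on_turn_complete(&mut self, context: Vec<String>, prompt_needed: bool) {
        if self.delivery == Delivery::Immediate || prompt_needed {
            // Include anything already parked so ordering is preserved.
            let mut batch = std::mem::take(&mut self.pending);
            batch.extend(context);
            if !batch.is_empty() {
                self.wire.push(Frame::Context(batch));
            }
        } else {
            self.pending.extend(context);
        }
    }

    /// User audio: flush parked context first so both land in the same burst.
    pub fn send_audio(&mut self, chunk: Vec<u8>) {
        if !self.pending.is_empty() {
            let ctx = std::mem::take(&mut self.pending);
            self.wire.push(Frame::Context(ctx));
        }
        self.wire.push(Frame::Audio(chunk));
    }
}
```

With `Delivery::Deferred`, nothing reaches the wire during silence; the first `send_audio` call emits the parked context and then the audio, matching the left-hand diagram.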

Interaction with Other Features

Feature	InstructionUpdate	ContextInjection	Hybrid
`with_context(fn)`	Appended to instruction text	Sent as model-role turn	Sent as model-role turn
`with_state(&[keys])`	Baked into instruction	Sent as model-role turn	Sent as model-role turn
`when(pred, text)`	Baked into instruction	Sent as model-role turn	Sent as model-role turn
`instruction_amendment`	Appended to instruction	Appended to context turn	Appended to instruction
`instruction_template`	Replaces instruction	Sent as context turn	Replaces instruction
`navigation()`	Baked into instruction	Baked into instruction	Baked into instruction
`greeting()`	Works normally	Works normally	Works normally
`prompt_on_enter`	Works normally	Works normally	Works normally
`enter_prompt`	Works normally	Works normally	Works normally

gemini-rs