How to properly retrieve context in an OpenAI Realtime conversation using response.create

Summary

A misconfigured OpenAI Realtime response.create call caused the model to ignore its custom instructions and instead continue the default assistant turn. Because the request neither referenced a conversation nor injected any prior messages, the model never saw the earlier context, so it produced repeated assistant messages instead of the expected summary/sentiment output.

Root Cause

  • conversation: "none" detaches the response from the session, so the model cannot access prior messages and falls back to continuing the active assistant turn.
  • Realtime sessions do not implicitly attach history to out‑of‑band responses.
  • instructions alone cannot override missing context; the model needs an explicit conversation reference or injected messages.
  • The developer assumed the model automatically sees the session transcript, which is not true for out‑of‑band responses.
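
A hypothetical reconstruction of the failing request makes the problem visible (the instruction text is illustrative, not taken from the real system):

```python
import json

# Hypothetical reconstruction of the failing payload: conversation "none"
# detaches the response from the session, and no "input" array is provided,
# so the model receives instructions but zero message history.
bad_event = {
    "type": "response.create",
    "response": {
        "conversation": "none",  # means: use NO conversation at all
        "instructions": "Summarize the conversation so far.",
        "output_modalities": ["text"],
    },
}

# Nothing in this payload carries prior messages to the model.
print(json.dumps(bad_event, indent=2))
```

There is no input array and no conversation reference, so "the conversation so far" simply does not exist from the model's point of view.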

Why This Happens in Real Systems

  • Realtime APIs optimize for low‑latency streaming, not automatic history replay.
  • Out‑of‑band responses are treated as stateless unless you explicitly attach context.
  • Many engineers assume the model has a “global memory,” but Realtime isolates each response unless told otherwise.
  • The API design avoids accidental leakage of prior messages, so history is opt‑in.
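
Because out-of-band responses arrive on the same socket interleaved with normal conversation events, a common pattern is to tag them with metadata and filter on it when reading events. A minimal sketch (the "sentiment_update" topic and the handler callback are illustrative):

```python
import json

def handle_raw_event(raw_message, on_sentiment):
    """Route a raw Realtime server event: invoke on_sentiment for tagged
    out-of-band results, ignore everything else. Returns True if handled."""
    event = json.loads(raw_message)
    if event.get("type") != "response.done":
        return False
    metadata = event.get("response", {}).get("metadata") or {}
    if metadata.get("topic") == "sentiment_update":
        on_sentiment(event["response"])
        return True
    return False
```

Filtering on metadata keeps the out-of-band analytics path cleanly separated from the main voice loop.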

Real-World Impact

  • Incorrect assistant behavior, such as repeating the next assistant turn.
  • Inability to generate summaries, analytics, or sentiment without manual history injection.
  • Hard-to-debug failures, because the model appears to “ignore” instructions.
  • Inconsistent behavior between default conversation responses and out‑of‑band responses.

Example or Code (if necessary and relevant)

Below is a minimal example of how senior engineers correctly inject context into an out‑of‑band response (it assumes ws is an already-open websocket connection to the Realtime API, and that conversation_history and prompt hold the transcript text and the task instructions):

import json

event = {
    "type": "response.create",
    "response": {
        "conversation": "none",  # out-of-band: detached from the default conversation
        "metadata": {"topic": "sentiment_update"},  # tag for matching the result later
        "input": [
            {
                # The transcript is injected explicitly as a system message
                "role": "system",
                "content": [{"type": "input_text", "text": conversation_history}]
            }
        ],
        "instructions": prompt,
        "output_modalities": ["text"]
    }
}
ws.send(json.dumps(event))
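
Before sending, it is worth sanity-checking the payload so that missing context fails fast in your own code rather than surfacing as odd model behavior. A hedged sketch of such a check (the function name is illustrative):

```python
def validate_out_of_band_event(event):
    """Raise if a conversation:"none" response.create carries no injected
    context — the exact failure mode described above."""
    response = event.get("response", {})
    if response.get("conversation") != "none":
        return  # attached to a conversation; history comes from the session
    input_items = response.get("input") or []
    if not input_items:
        raise ValueError('conversation "none" with empty input: the model will see no history')
    for item in input_items:
        if "role" not in item or "content" not in item:
            raise ValueError(f"malformed input item: {item!r}")
```

Calling this just before ws.send turns a silent context loss into an immediate, debuggable exception.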

How Senior Engineers Fix It

  • Manually inject the conversation history into the input field of the response.create event.
  • Keep the transcript in process memory, not on disk, so it can be re‑injected with low latency whenever an out‑of‑band response is needed.
  • Use server-side session state to maintain a rolling buffer of messages.
  • Avoid relying on conversation: "none" unless you explicitly provide all required context.
  • Validate the event payload to ensure the model receives the correct roles and content types.
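
The rolling-buffer approach above can be sketched as a small in-memory helper (the class and method names are illustrative, not part of any API):

```python
from collections import deque

class TranscriptBuffer:
    """Rolling in-memory transcript, trimmed to the most recent turns."""

    def __init__(self, max_turns=50):
        self._turns = deque(maxlen=max_turns)  # old turns fall off automatically

    def add(self, role, text):
        self._turns.append(f"{role}: {text}")

    def as_text(self):
        return "\n".join(self._turns)

    def as_input_items(self):
        # Shape the buffer as the single system message that
        # response.create expects in its "input" array.
        return [{
            "role": "system",
            "content": [{"type": "input_text", "text": self.as_text()}],
        }]
```

Each user or assistant turn is appended as it completes, and as_input_items() is dropped straight into the input field of the response.create event shown earlier.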

Why Juniors Miss It

  • They assume Realtime behaves like Chat Completions, where history is automatic.
  • They misunderstand conversation: "none" as “use the current conversation,” when it actually means “use no conversation at all.”
  • They expect the model to infer context instead of explicitly providing it.
  • They rely on instructions alone, not realizing that instructions cannot compensate for missing message history.
