OpenAI adds WebSocket mode to Responses API
YT · YOUTUBE // PRODUCT UPDATE

OpenAI's Responses API now supports a persistent WebSocket mode that lets long-running agents send only incremental inputs and chain turns with `previous_response_id` instead of resending full context. For coding agents and orchestration loops with 20-plus tool calls, OpenAI says the feature can cut end-to-end latency by roughly 40% while remaining compatible with `store=false` and Zero Data Retention.
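To make the continuation model concrete, here is a minimal sketch of how incremental turns might be chained over a persistent connection. The `build_turn` helper, the model name, and the exact JSON field layout beyond `previous_response_id` and `store` are illustrative assumptions, not OpenAI's published wire schema.

```python
import json


def build_turn(user_input, previous_response_id=None, store=False):
    """Build one incremental request payload for a persistent connection.

    Only the new input for this turn is included; prior context is
    referenced via previous_response_id instead of being resent.
    NOTE: this message shape is a hypothetical sketch, not OpenAI's schema.
    """
    payload = {
        "model": "gpt-5",   # placeholder model name for illustration
        "input": user_input,  # incremental input only, not full history
        "store": store,       # store=False keeps the ZDR-compatible path
    }
    if previous_response_id is not None:
        # Chain onto the previous turn instead of resending context.
        payload["previous_response_id"] = previous_response_id
    return json.dumps(payload)


# First turn carries no continuation id; later turns reference the
# response id returned by the server ("resp_abc" here is made up).
turn1 = build_turn("List the repo's failing tests.")
turn2 = build_turn("Fix the first one.", previous_response_id="resp_abc")
```

The payoff is that each frame after the first stays small: the socket carries only the new instruction plus a short id, rather than the whole accumulated transcript of a 20-plus-call agent loop.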

// ANALYSIS

This is the kind of plumbing upgrade that matters more than a flashy model bump for agent builders. OpenAI is turning the Responses API into something that behaves more like a real session transport for tool-heavy workflows, not just another stateless wrapper around inference.

  • Persistent sockets plus incremental inputs attack a real bottleneck: repeated context transfer across every tool call in an agent loop
  • The feature is squarely aimed at coding agents and orchestration systems, where latency compounds fast once a task fans out into dozens of model-tool turns
  • Compatibility with `store=false` and Zero Data Retention is a big deal for teams running agents over private code or sensitive internal workflows
  • The tradeoffs are real: one in-flight response per connection, no multiplexing, a 60-minute connection cap, and cache misses can still break continuation with `previous_response_not_found`
  • Net effect: agent frameworks built on OpenAI can lean less on custom transport hacks and more on the platform's native continuation model
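Because continuation can still fail with `previous_response_not_found`, robust agents need a replay path. The sketch below shows one way to structure that fallback; `send`, the error field layout, and the history format are assumptions standing in for whatever transport wrapper a framework actually uses.

```python
def continue_or_replay(send, turn_input, previous_response_id, full_history):
    """Try incremental continuation; on a cache miss, replay full context.

    `send` is a stand-in for one request/response round trip over the
    socket, returning a parsed dict. The error code and field names
    mirror the documented failure mode but are otherwise illustrative.
    """
    resp = send({
        "input": turn_input,
        "previous_response_id": previous_response_id,
    })
    if resp.get("error", {}).get("code") == "previous_response_not_found":
        # Continuation state expired or was evicted (e.g. after the
        # 60-minute connection cap forced a reconnect): fall back to
        # resending the accumulated history as a fresh request.
        resp = send({"input": full_history + [turn_input]})
    return resp
```

Keeping the full history around locally is the price of `store=false`: the server holds no durable copy, so the client must be able to reconstruct context whenever the cheap incremental path misses.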
// TAGS
openai-responses-api · api · agent · llm · automation

DISCOVERED

2026-03-06 (37d ago)

PUBLISHED

2026-03-06 (37d ago)

RELEVANCE

9/10

AUTHOR

Theo - t3.gg