OPEN_SOURCE ↗
REDDIT · 21d ago · VIDEO
vLLM powers local Claude Code swarm
A Reddit video shows a Linux workstation running vLLM locally with Claude Code and gpt-oss-120b, turning four agents into a fully offline coding swarm. The poster says the setup can scale to eight concurrent agents, with throughput becoming the main tradeoff.
// ANALYSIS
Local agent stacks are starting to look like real infrastructure instead of a hobbyist stunt. vLLM is the key enabler here: it gives Claude Code a localhost backend, parallel inference, and the kind of batching behavior that makes multi-agent work feel practical.
- vLLM's OpenAI-compatible server and Docker path make it easy to plug into existing agent tooling without rewriting workflows.
- OpenAI's gpt-oss-120b is built for efficient local inference and tool use, which fits the offline, self-hosted setup.
- The bottleneck shifts from "can I run this?" to "how many agents can my GPU keep fed?" — which is the right problem to have.
- This is a strong argument for Linux plus local inference if you care about privacy, latency, or avoiding per-request cloud costs.
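A setup along these lines can be sketched in a few commands. This is a hedged sketch, not the poster's exact configuration: the model id matches gpt-oss-120b's Hugging Face repo, `--max-num-seqs 8` mirrors the claimed eight-agent ceiling, and the translation layer is an assumption — Claude Code speaks the Anthropic API, so something (LiteLLM is used here as one plausible option) must sit between it and vLLM's OpenAI-compatible endpoint.

```shell
# Serve gpt-oss-120b through vLLM's OpenAI-compatible server (default port 8000).
# --max-num-seqs bounds concurrent sequences, which effectively caps the swarm size.
vllm serve openai/gpt-oss-120b --max-num-seqs 8

# Claude Code expects an Anthropic-style API, so an API-translation proxy is
# assumed here; LiteLLM's hosted_vllm provider is one way to bridge the gap.
litellm --model hosted_vllm/openai/gpt-oss-120b --port 4000

# Point Claude Code at the local proxy instead of Anthropic's cloud.
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=dummy-local-key
claude
```

Ports, the proxy choice, and the dummy key are all illustrative; the load-bearing idea is that vLLM batches requests from all agents against one model instance, so adding agents trades per-agent throughput rather than requiring more model copies.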
// TAGS
vllm · claude-code · gpt-oss · ai-coding · agent · inference · gpu · self-hosted
DISCOVERED
21d ago
2026-03-22
PUBLISHED
21d ago
2026-03-22
RELEVANCE
8/10
AUTHOR
swagonflyyyy