OPEN_SOURCE
REDDIT · 24d ago · TUTORIAL
LM Studio crawls over LAN
A Reddit user says LM Studio runs Qwen3.5-35B at 20+ tokens/sec locally on a gaming rig, but drops to 3-5 tokens/sec, with a minute-long first response, when accessed from a laptop over LAN. The post asks whether the slowdown is in LM Studio, the network path, or the remote-client setup.
// ANALYSIS
This sounds less like the model itself and more like something in the remote-serving path: transport, client wrapper, or server-side prompt handling. LM Studio explicitly supports serving models over the network, so a drop this large is a red flag for configuration, not just “LAN latency.”
- LM Studio docs say the app can serve OpenAI-compatible endpoints on the local network, and LM Link is built for remote-device access.
- A minute-long first response points to first-token latency, model loading, or a reconnect/timeout loop rather than raw token-generation speed.
- If the laptop is going through a proxy, VPN, Docker bridge, or extra frontend, that extra hop can crush perceived responsiveness.
- The post captures a common local-LLM reality: remote convenience only works well when the server stays resident and the API path is truly direct.
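One way to narrow this down is to measure time-to-first-token separately from steady-state generation speed, hitting the OpenAI-compatible streaming endpoint directly from both machines. The sketch below does that with the standard library only; the host, port, and model name are placeholder assumptions, not values from the post.

```python
# Sketch: separate first-token latency from generation speed against an
# OpenAI-compatible /v1/chat/completions endpoint (as LM Studio exposes).
# base_url and model are ASSUMED placeholders -- substitute your own.
import json
import time
import urllib.request

def summarize(start, token_times):
    """Reduce per-chunk arrival timestamps to the two numbers that matter:
    time-to-first-token (load/prompt cost) and steady-state chunks/sec."""
    ttft = token_times[0] - start
    window = token_times[-1] - token_times[0]
    rate = (len(token_times) - 1) / window if window > 0 else 0.0
    return {"ttft_s": round(ttft, 2), "chunks_per_s": round(rate, 1)}

def probe(base_url="http://192.168.1.50:1234", model="local-model"):
    # Calling the endpoint directly (no proxy, VPN, or frontend) isolates
    # the server; run the same probe from the laptop and compare.
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "Say hello."}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions", data=body,
        headers={"Content-Type": "application/json"},
    )
    start, times = time.monotonic(), []
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # server-sent events, one "data: ..." line per chunk
            line = raw.decode().strip()
            if line.startswith("data: ") and line != "data: [DONE]":
                chunk = json.loads(line[len("data: "):])
                if chunk["choices"][0].get("delta", {}).get("content"):
                    times.append(time.monotonic())
    return summarize(start, times)

if __name__ == "__main__":
    print(probe())
```

A high `ttft_s` with a normal `chunks_per_s` points at model loading, prompt processing, or a reconnect loop; a uniformly low `chunks_per_s` from the laptop but not from the host points at the network path or an extra hop.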
// TAGS
lm-studio · llm · inference · api · self-hosted · devtool
DISCOVERED
24d ago
2026-03-18
PUBLISHED
25d ago
2026-03-18
RELEVANCE
7/10
AUTHOR
chiliraupe