LM Studio crawls over LAN
OPEN_SOURCE
REDDIT · TUTORIAL · 24d ago

A Reddit user reports that LM Studio runs Qwen3.5-35B at 20+ tokens/sec directly on a gaming rig, but drops to 3-5 tokens/sec, with a minute-long wait for the first response, when accessed from a laptop over the LAN. The post asks whether the slowdown lies in LM Studio, the network path, or the remote-client setup.

// ANALYSIS

This sounds less like the model itself and more like something in the remote-serving path: transport, client wrapper, or server-side prompt handling. LM Studio explicitly supports serving models over the network, so a drop this large is a red flag for configuration, not just “LAN latency.”
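One way to rule out the client wrapper is to hit the server's OpenAI-compatible endpoint directly from the laptop. A minimal sketch, assuming LM Studio's default port of 1234; the host IP and model identifier below are placeholders, not details from the post:

```python
import json

# Assumptions (not from the post): LM Studio's OpenAI-compatible server
# defaults to port 1234; the host IP and model name are placeholders.
BASE_URL = "http://192.168.1.50:1234/v1/chat/completions"

def build_request(prompt: str, model: str = "qwen3.5-35b") -> dict:
    """Minimal OpenAI-compatible chat payload. stream=True makes the
    server emit tokens as they are generated, so the client can see
    time-to-first-token separately from steady-state speed."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }

# POST this with any HTTP client from the laptop, pointing straight at
# the rig's IP with no proxy, VPN, or extra frontend in between.
print(json.dumps(build_request("ping"), indent=2))
```

If a direct request like this is fast while the usual client is slow, the problem is in the client path, not LM Studio or the network.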

  • LM Studio docs say the app can serve OpenAI-compatible endpoints on the local network, and LM Link is built for remote-device access.
  • A minute-long first response points to first-token latency, model loading, or a reconnect/timeout loop more than raw token-generation speed.
  • If the laptop is going through a proxy, VPN, Docker bridge, or extra frontend, that extra hop can crush perceived responsiveness.
  • The post captures a common local-LLM reality: remote convenience only works well when the server stays resident and the API path is truly direct.
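The first-token-latency point is worth making concrete: a streamed response can be split into time-to-first-token (TTFT) and steady-state throughput, and the post's symptoms fit a large TTFT with normal generation speed. A small sketch with a hypothetical token-arrival trace:

```python
def summarize_stream(arrivals):
    """Split a streamed response into time-to-first-token (TTFT) and
    steady-state throughput. `arrivals` holds seconds-since-request
    for each received token."""
    ttft = arrivals[0]
    window = arrivals[-1] - arrivals[0]
    tps = (len(arrivals) - 1) / window if window > 0 else float("inf")
    return ttft, tps

# Hypothetical trace matching the post's symptoms: first token after
# 60 s, then one token every 50 ms (~20 tok/s once generation starts).
trace = [60.0 + 0.05 * i for i in range(101)]
ttft, tps = summarize_stream(trace)
print(f"TTFT {ttft:.1f} s, throughput {tps:.1f} tok/s")
```

A trace like this would point at model loading or prompt processing on first contact, not at LAN bandwidth, since throughput after the first token is unchanged.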
// TAGS
lm-studio · llm · inference · api · self-hosted · devtool

DISCOVERED

24d ago

2026-03-18

PUBLISHED

25d ago

2026-03-18

RELEVANCE

7/10

AUTHOR

chiliraupe