llama-swap stumbles on MLX model names
This Reddit post asks whether llama-swap can reliably front `mlx_lm.server` on an M2 Max: the server works when run directly, but the proxied setup stalls after loading, with an error that treats the alias `qwen35-27b-mlx` as a Hugging Face repo id.
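A plausible reading of the symptom: llama-swap forwards the client's `model` field upstream, and if `mlx_lm.server` treats an unknown model name as something to load, it will attempt a local-path or Hugging Face lookup on the alias. A minimal sketch of the two paths, with the repo id, ports, and prompt as placeholder assumptions rather than the poster's actual values:

```sh
# Direct invocation that reportedly works; the repo id is a placeholder
# for whatever MLX conversion is actually in use.
mlx_lm.server --model mlx-community/some-qwen-mlx-4bit --port 8081

# Through llama-swap, the request body carries the alias. If mlx_lm.server
# tries to resolve "qwen35-27b-mlx" as a path or Hugging Face repo id,
# the reported error is exactly what you'd expect.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen35-27b-mlx", "messages": [{"role": "user", "content": "hello"}]}'
```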
The hot take is that this looks more like a config mismatch than proof that llama-swap cannot work with MLX.
- The direct `mlx_lm.server` path already works, so the backend and `/v1/chat/completions` compatibility are not the problem.
- The failure is a Hugging Face repo resolution error for `qwen35-27b-mlx`, which suggests the proxy is passing the alias where MLX expects a real model path or repo id.
- llama-swap is designed around wrapping OpenAI-compatible servers, so MLX should work in principle, but only if the upstream command and model identifier are wired the way MLX expects (see the config sketch after this list).
- As written, this is a good troubleshooting report for Apple Silicon users, but it does not yet establish a working MLX + llama-swap setup.
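If that diagnosis holds, the fix is to keep the alias local to llama-swap and hand `mlx_lm.server` an identifier it can actually resolve. A minimal, untested sketch using llama-swap's YAML layout (`models`, `cmd`, `proxy`, and the `${PORT}` macro); the repo id is a placeholder, and `useModelName`, which llama-swap documents as a way to override the model name sent upstream, is the load-bearing line if the installed version supports it:

```yaml
# llama-swap config sketch; ids and paths are placeholders, not the poster's values
models:
  "qwen35-27b-mlx":                 # the alias clients put in the "model" field
    cmd: >
      mlx_lm.server
      --model mlx-community/some-qwen-mlx-4bit
      --port ${PORT}
    proxy: "http://127.0.0.1:${PORT}"
    # Rewrite the model name forwarded upstream so mlx_lm.server never sees
    # the bare alias; use the same repo id or a local path it can resolve.
    useModelName: "mlx-community/some-qwen-mlx-4bit"
```

If `useModelName` is not available in the installed version, naming the llama-swap entry after the real repo id (so the forwarded value is already resolvable) should have the same effect.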
DISCOVERED: 2h ago (2026-05-09)
PUBLISHED: 6h ago (2026-05-09)
AUTHOR: No_Algae1753