MOLA debuts multi-LoRA serving on Apple Silicon
REDDIT // 17d ago // OPEN-SOURCE RELEASE

MOLA is an alpha MLX-native multi-LoRA inference server for Apple Silicon that keeps one base model loaded and routes LoRA adapters per request. Its published benchmark on Qwen3.5-9B-MLX-4bit with 8 resident adapters shows that mixed-adapter traffic stays usable even as throughput drops under concurrent load.
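The core idea, one resident base model plus a pool of per-request adapters, can be sketched in a few lines. This is a hypothetical illustration of the routing pattern, not MOLA's actual code; the class, method names, and the 8-adapter cap mirror the benchmark setup but are otherwise invented here.

```python
class MultiLoRAServer:
    """Toy sketch: one shared base model, adapters selected per request."""

    def __init__(self, base_model, max_resident=8):
        self.base = base_model      # loaded once, shared by all requests
        self.adapters = {}          # adapter name -> loaded LoRA weights
        self.max_resident = max_resident

    def load_adapter(self, name, weights):
        # Hot-load: refuse new adapters once the resident pool is full.
        if len(self.adapters) >= self.max_resident:
            raise RuntimeError("adapter pool full; unload one first")
        self.adapters[name] = weights

    def unload_adapter(self, name):
        self.adapters.pop(name, None)

    def generate(self, prompt, model):
        # The request's "model" field doubles as the adapter selector;
        # unknown names fall through to the plain base model.
        adapter = self.adapters.get(model)
        tag = model if adapter is not None else "base"
        return f"[{tag}] completion for: {prompt}"


server = MultiLoRAServer(base_model="qwen3.5-9b-4bit")
server.load_adapter("rust-helper", weights=object())
server.load_adapter("sql-helper", weights=object())

print(server.generate("write a join", model="sql-helper"))
print(server.generate("hello", model="qwen3.5-9b-4bit"))
```

The point of the pattern is that switching specialists costs a dictionary lookup instead of a multi-gigabyte checkpoint reload.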

// ANALYSIS

The interesting part here is less the feature than the portability gap it closes: CUDA stacks already made multi-LoRA serving feel normal, and MOLA makes that workflow plausible on Apple Silicon. The project still reads like serious infrastructure in progress, not a polished runtime, but the benchmark is strong enough to justify the experiment.

  • On an Apple M5 Max 64GB, same-adapter vs mixed-adapter throughput is identical at concurrency 1 and only diverges once requests overlap, which is exactly the point where adapter routing starts to matter.
  • The mixed-workload penalty is real, about 22% at concurrency 16 and 24% at 64, but that is a reasonable trade if it avoids reloading full fine-tuned checkpoints.
  • The OpenAI-compatible API, per-request `model` selector, and runtime adapter hot-load/unload make it practical for local specialist workflows like Rust, SQL, and ops.
  • The main blockers are also clear: a local `mlx-lm` patch is still required, KV cache reuse breaks when adapters switch mid-conversation, and the whole stack is Apple Silicon-only for now.
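Because the API is OpenAI-compatible, selecting an adapter should be as simple as naming it in the request's `model` field. A minimal sketch of what such a request body would look like; the endpoint shape follows the standard chat-completions format, and the adapter name `rust-helper` is purely illustrative:

```python
import json


def chat_payload(adapter, prompt):
    # In an OpenAI-compatible API, the per-request "model" field is the
    # natural place to name the LoRA adapter to route to.
    return {
        "model": adapter,
        "messages": [{"role": "user", "content": prompt}],
    }


req = chat_payload("rust-helper", "Explain lifetimes in Rust.")
print(json.dumps(req, indent=2))
```

Any off-the-shelf OpenAI client should work unchanged, which is what makes the per-request selector practical for local specialist workflows.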
// TAGS
mola · inference · open-source · self-hosted · llm · api · benchmark

DISCOVERED

2026-03-25 (17d ago)

PUBLISHED

2026-03-25 (17d ago)

RELEVANCE

8/10

AUTHOR

No_Shift_4543