Local Legal Stack Goes MoE

// INFRASTRUCTURE · 46d ago

A lawyer updated a self-hosted legal drafting system built around 12 V100s, a second GPU box, and llama.cpp after moving away from vLLM for the models he actually wants to run. The stack now routes drafting, reasoning, review, and cite verification across pinned local models to keep hallucinations out of final documents.
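The post does not publish its routing code. As a rough illustration of what "pinned" routing can mean, here is a minimal Python sketch. It assumes each pipeline stage is served by its own local llama.cpp server endpoint. The stage names, model labels, ports, and the `route` helper are all hypothetical. The escalation rule mirrors the analysis below: a 35B MoE as the routine default and a 122B model kept as a high-stakes tier.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    REASON = "reason"
    REVIEW = "review"
    VERIFY = "verify"


@dataclass(frozen=True)
class ModelPin:
    # Hypothetical labels and endpoints; the post does not list exact model files or ports.
    name: str
    endpoint: str


# Hypothetical routing table: each stage is pinned to exactly one local model.
ROUTES = {
    Stage.DRAFT: ModelPin("moe-35b", "http://127.0.0.1:8081"),
    Stage.REASON: ModelPin("moe-35b", "http://127.0.0.1:8081"),
    Stage.REVIEW: ModelPin("moe-35b-reviewer", "http://127.0.0.1:8082"),
    Stage.VERIFY: ModelPin("verifier", "http://127.0.0.1:8083"),
}

# Hypothetical quality tier used only when a matter is flagged as high-stakes.
HIGH_STAKES = ModelPin("moe-122b", "http://127.0.0.1:8090")


def route(stage: Stage, high_stakes: bool = False) -> ModelPin:
    """Return the pinned model for a stage, escalating drafting and reasoning when high-stakes."""
    if high_stakes and stage in (Stage.DRAFT, Stage.REASON):
        return HIGH_STAKES
    # Review and verification stay on their own pins regardless of stakes.
    return ROUTES[stage]
```

Pinning review and verification to fixed models, rather than escalating everything, keeps the checking stages stable even when the drafting model changes.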

// ANALYSIS

The real story is not the hardware flex; it’s that MoE finally made the local setup usable for real drafting work, while dense models on Volta stayed too slow to justify their footprint.

– llama.cpp won because the target workload is MoE GGUFs on V100, and the relevant bottleneck is kernel support plus memory behavior, not just raw GPU count
– The throughput gap is stark: the author reports MoE models in the 50-113 tok/s range, while dense 27B-32B models land below the practical floor
– The pipeline is doing the important work: a router, a gate model, an adversarial reviewer, and a verifier for cites/dates/Bates numbers matter more than any single model choice
– The self-poisoning bug is the cautionary lesson here: if your RAG context includes prior outputs, the system will confidently ground on its own slop
– Keeping the 122B model around is defensible as a high-stakes quality tier, but the 35B MoE looks like the sensible default for routine work
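The post does not describe how the author's verifier works, but a cite/date/Bates check can be made fully deterministic. A minimal sketch, assuming the check is exact matching of extracted identifiers against the source record: anything the draft asserts that no source contains gets flagged for a human. The regexes are assumptions (a Bates prefix of 2 to 8 capital letters plus 6 to 8 digits, ISO dates only, a handful of federal reporter formats), and `extract_claims` and `unsupported_claims` are hypothetical names.

```python
import re

# Assumed formats only; real Bates prefixes and citation styles vary by matter and court.
BATES = re.compile(r"\b[A-Z]{2,8}[-_]?\d{6,8}\b")
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
CITE = re.compile(
    r"\b\d{1,4} (?:U\.S\.|S\. Ct\.|F\. Supp\. (?:2d|3d)|F\.(?:2d|3d|4th)?) \d{1,5}\b"
)


def extract_claims(text: str) -> set[str]:
    """Collect every Bates number, ISO date, and reporter cite that appears in text."""
    claims: set[str] = set()
    for pattern in (BATES, ISO_DATE, CITE):
        claims.update(match.group(0) for match in pattern.finditer(text))
    return claims


def unsupported_claims(draft: str, sources: list[str]) -> set[str]:
    """Return identifiers the draft uses that no source document contains exactly."""
    # Compare extracted sets rather than substrings, so "ACME000123" is not
    # accepted just because "ACME0001234" appears somewhere in the record.
    supported = extract_claims("\n".join(sources))
    return extract_claims(draft) - supported
```

Exact matching is deliberately strict: a cite that differs by one digit is reported rather than silently accepted, which is the failure this stage exists to catch.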
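One way to avoid the self-poisoning failure is to track provenance at ingestion time and refuse to retrieve anything a model wrote. The sketch below assumes a hypothetical `provenance` tag on each chunk ("source" for client documents, pleadings, and exhibits; "generated" for model output) plus a set of fingerprints of prior outputs, which also catches generated text that re-entered the index untagged. It is one possible guard, not the fix the author describes, and `Chunk`, `fingerprint`, and `build_context` are illustrative names.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(text: str) -> str:
    """SHA-256 of case- and whitespace-normalized text, so trivially reformatted copies still match."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    provenance: str  # hypothetical tag: "source" or "generated"


def build_context(
    candidates: list[Chunk], generated_fingerprints: set[str], max_chunks: int = 8
) -> list[Chunk]:
    """Keep retrieved chunks in rank order, dropping anything a model produced."""
    kept: list[Chunk] = []
    for chunk in candidates:
        if chunk.provenance != "source":
            continue  # tagged model output: never ground on it
        if fingerprint(chunk.text) in generated_fingerprints:
            continue  # untagged copy of a prior output that slipped into the index
        kept.append(chunk)
        if len(kept) == max_chunks:
            break
    return kept
```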

// TAGS

llm, moe, long-context, inference, gpu, rag, self-hosted, local-legal-drafting-stack

DISCOVERED

46d ago

2026-05-26

PUBLISHED

46d ago

2026-05-25

RELEVANCE

8/10

AUTHOR

TumbleweedNew6515
