OPEN_SOURCE
REDDIT // 23d ago // INFRASTRUCTURE
llama.cpp adds activation capture and control-vector steering
llama.cpp now exposes `/activations` endpoints in `llama-server`, letting users capture per-layer activations, stream per-token vectors to disk, and feed them into a sparse autoencoder workflow. The companion pipeline turns those features into GGUF control vectors for real-time steering and interpretability work.
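The summary does not spell out llama.cpp's actual binary collection format or the `/activations` wire protocol, but the general pattern it describes is simple: stream one flat float32 vector per token to disk, then read the file back for analysis. A minimal self-contained sketch of that pattern (the file path, `DIM`, and the top-K mean view here are illustrative assumptions, not llama.cpp's real format):

```python
import os
import tempfile
import numpy as np

DIM = 8  # hidden size; real models use e.g. 4096 (assumption for illustration)
PATH = os.path.join(tempfile.gettempdir(), "acts_demo.bin")  # hypothetical capture file

# --- capture side: append one float32 activation vector per token ---
rng = np.random.default_rng(0)
with open(PATH, "wb") as f:
    for _ in range(16):  # pretend we captured 16 tokens
        vec = rng.standard_normal(DIM).astype(np.float32)
        f.write(vec.tobytes())

# --- analysis side: reinterpret the flat file as a (tokens, dim) matrix ---
acts = np.fromfile(PATH, dtype=np.float32).reshape(-1, DIM)

# "top-K mean view": mean activation per dimension, keep the K strongest
K = 3
mean = acts.mean(axis=0)
top_k = np.argsort(-np.abs(mean))[:K]
print("top-K dims:", top_k, "means:", mean[top_k])
```

Because the on-disk layout is just contiguous float32 vectors, the analysis side needs no parser beyond `np.fromfile` plus a reshape, which is what keeps this data path easy to automate.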
// ANALYSIS
This is a genuinely useful bridge between mechanistic interpretability and a production-adjacent local serving stack, not just another notebook experiment.
- The live capture API makes activation analysis practical inside the server you already run, instead of forcing a separate instrumentation stack.
- The binary collection format and top-K mean view keep the data path simple enough to automate and scale.
- The inter-cluster differential scoring is the smartest part here: it targets behavior-specific features, not just whatever lights up on a single phrase set.
- The MoE scale caveat matters a lot; control vectors are powerful, but they are also model- and embedding-dimension-sensitive enough to require calibration.
- For local model users, this opens a clean path from observability to intervention: collect, train, probe, export, steer.
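The "collect, probe, export, steer" path above can be sketched at its core. In the common control-vector formulation, the vector is the normalized difference between mean activations of two contrasting prompt clusters, added (scaled) to the hidden state at inference time. Everything below, the dimensions, the synthetic clusters, and the steering strength, is an illustrative assumption, not llama.cpp's actual GGUF exporter:

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 16  # embedding dimension (assumption; real models use 4096+)

# Activations captured on two contrasting prompt sets (synthetic stand-ins):
# the "positive" cluster carries the target behavior as an offset along dim 0.
pos = rng.standard_normal((100, DIM)) + np.eye(DIM)[0] * 2.0  # behavior present
neg = rng.standard_normal((100, DIM))                          # behavior absent

# Differential control vector: difference of cluster means, unit-normalized.
diff = pos.mean(axis=0) - neg.mean(axis=0)
control = diff / np.linalg.norm(diff)

# Steering at inference: add a scaled copy of the vector to a hidden state.
hidden = rng.standard_normal(DIM)
strength = 2.0
steered = hidden + strength * control
print("control head:", control[:4])
```

The inter-cluster differencing is what makes the vector behavior-specific: features that fire equally on both prompt sets cancel out, so only the contrast between the clusters survives into the exported vector.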
// TAGS
llama-cpp · llm · open-source · inference · research
DISCOVERED
2026-03-20
PUBLISHED
2026-03-20
RELEVANCE
8/10
AUTHOR
wattswrites