SmolLM2-135M shows unusual steerability in KV-cache test
This Reddit post is an anecdotal benchmark of Hugging Face’s SmolLM2-135M-Instruct, a 135M-parameter on-device language model trained on 2T tokens. The author claims that with logit steering and KV-cache constraints, the model stays surprisingly consistent even without a system prompt or hidden context, suggesting small models may be more controllable than expected when inference-time guidance is carefully engineered.
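The post does not spell out what its "KV-cache constraints" are. One hypothetical reading is a bounded cache that pins a fixed prefix, such as the opening tokens, so those entries are never evicted while later entries roll off first in, first out. The sketch below models only that bookkeeping. `PinnedKVCache`, `pinned`, and `capacity` are illustrative names, and the string entries stand in for per-token key/value tensors; nothing here is claimed to be the author's method.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PinnedKVCache:
    """Hypothetical bounded KV cache (an assumption, not the post's method).

    The first `pinned` entries are kept forever; the remaining slots hold
    only the most recent entries, evicting the oldest non-pinned one first.
    """
    pinned: int
    capacity: int
    _prefix: list = field(default_factory=list, init=False, repr=False)
    _recent: deque = field(default_factory=deque, init=False, repr=False)

    def __post_init__(self):
        # At least one slot must remain for non-pinned entries.
        if not 0 <= self.pinned < self.capacity:
            raise ValueError("require 0 <= pinned < capacity")

    def append(self, kv):
        # Fill the pinned prefix first; those entries are never evicted.
        if len(self._prefix) < self.pinned:
            self._prefix.append(kv)
            return
        self._recent.append(kv)
        # Drop the oldest non-pinned entry once the cache is over capacity.
        while len(self._prefix) + len(self._recent) > self.capacity:
            self._recent.popleft()

    def entries(self):
        # Attention would run over the pinned prefix plus the recent window.
        return self._prefix + list(self._recent)


cache = PinnedKVCache(pinned=2, capacity=5)
for t in range(10):
    cache.append(f"kv{t}")
print(cache.entries())  # ['kv0', 'kv1', 'kv7', 'kv8', 'kv9']
```

Whatever else changes, the pinned entries stay visible to every later step, which is one way a cache policy could keep a small model anchored without a system prompt.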
Hot take: this is less a product launch and more a proof of concept that small models can be nudged into stable behavior with inference-time controls.
- The underlying model is real and official: SmolLM2-135M-Instruct is part of Hugging Face’s SmolLM2 family.
- The post’s main signal is controllability, not raw capability; logit steering before sampling can materially shape outputs.
- The claims are anecdotal and not presented as a formal benchmark, so treat the result as directional rather than conclusive.
- The “what could it do with billions of tokens” line is speculation; the stronger takeaway is that architecture plus decoding-time control can have an outsized effect at small scale.
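"Logit steering before sampling" can be illustrated in a few lines. The post does not describe its steering rule, so this is a generic sketch assuming a simple additive per-token bias: positive values boost a token, `float("-inf")` bans it, and the steered scores then go through a temperature softmax and a weighted draw. Token strings stand in for vocabulary IDs, and `steer_logits`, `softmax`, and `sample` are hypothetical helpers. In a real Hugging Face pipeline the same step would sit where a `transformers` `LogitsProcessor` edits the scores.

```python
import math
import random


def steer_logits(logits, bias):
    """Add a per-token bias to raw logits (hypothetical steering rule)."""
    return {tok: score + bias.get(tok, 0.0) for tok, score in logits.items()}


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature must be > 0."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp((s - m) / temperature) for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample(probs, rng):
    """Draw one token; zero-probability (banned) tokens are never chosen."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


raw = {"yes": 1.0, "no": 1.0, "maybe": 1.0}
steered = steer_logits(raw, {"yes": 2.0, "maybe": float("-inf")})
probs = softmax(steered)
print(probs["maybe"])  # 0.0
print(sample(probs, random.Random(0)) in {"yes", "no"})  # True
```

The bias is applied to logits rather than probabilities so that boosts and bans compose additively and the softmax renormalizes the result automatically.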
Discovered: 2026-04-26
Published: 2026-04-25
Author: shamanicalchemist