Script makes tokens-per-second feel concrete
tokenspeed is a lightweight script and web demo for building intuition around LLM generation speed. It translates raw tokens-per-second numbers into a more human sense of how text, code, and reasoning+code actually feel while you wait. The goal is not to benchmark models, but to make performance claims easier to interpret in day-to-day local LLM use.
This is useful because tokens/sec is objective but not intuitive, especially once you move beyond plain chat.
- 21 tokens/second is usually in the “feels responsive” range for plain text, though longer outputs still benefit from faster throughput.
- 10 tokens/second is not unusable; it is more “noticeably slow” than “broken,” and the delay becomes more obvious on code and reasoning tasks.
- The strongest value here is calibration: it helps people compare claims across workloads instead of arguing from raw numbers alone.
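The intuition above can be sketched as simple arithmetic: generation time is roughly output length divided by throughput. The following is a minimal illustration, not part of tokenspeed itself; the token counts per output type are rough assumptions chosen for illustration.

```python
# Rough, assumed token counts for typical output types (not measured values).
TYPICAL_OUTPUTS = {
    "short chat reply": 150,
    "code snippet": 400,
    "reasoning + code": 1200,
}

def wait_seconds(tokens: int, tokens_per_second: float) -> float:
    """Estimated generation time in seconds, ignoring prompt
    processing and first-token latency."""
    return tokens / tokens_per_second

for tps in (10, 21, 50):
    print(f"--- {tps} tokens/second ---")
    for label, tokens in TYPICAL_OUTPUTS.items():
        print(f"{label:>18}: {wait_seconds(tokens, tps):6.1f} s")
```

For example, a 1200-token reasoning-plus-code answer takes about two minutes at 10 tokens/second but under a minute at 21, which is why the same throughput number can feel fine for chat and painful for longer tasks.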
Discovered: 2026-05-10
Published: 2026-05-10
Author: MikeNonect